For my data, I decided to look into what were checked out during the years 2022-2023 from the Seattle Public Library. With this data, I am trying to find what books and items were checked out the most. Not only that, I also wanted to see what different categories of things were being checked out at the library and how frequently they were being checked out. I wanted to take a look at trends/books/items from this data set because one, it gives me the most recent data I could take a look at. two, it lets me see what others are checking out more frequently than other books and items from the library, which then could help me look further into why that is the case. Is it an award winning book? Is it written by a famous Author? It lets me answer questions that I am unable to answer without this data set.
Write a summary paragraph of findings that includes the 5 values calculated from your summary information R script
For my summary, I focused on finding the highest amounts and averages. For example, I found out that the most checked out category/type of item was a BOOK. Finding this out, it led me to ask which physical book had the highest amount of checkouts from this data, which then I found was Sea of tranquility / Emily St. John Mandel. This then led me to wonder just how much is the average amount of checkouts per month. While looking more into the data set, I found out that the average checkout number per month was 3.3789608^{5}. Which in other words means that there was around 337,896 physical books being checked out per month. That is crazy! Another thing I wanted to look into while doing summaries was how many kids were checking out Dr.Seuss books. When I was a kid, I would often read and check out Seuss books myself and wanted to see if that was still the case with kids today. With further search, I found that at the Seattle Pacific Library, 744.4166667 of Dr.Seuss books were being checked out each month. I’m glad that the books I grew up with are still helping kids learn and have fun while reading books. Finally, the last thing I wanted to look at was which publisher saw the most success with their works. More people reading and checking out their books should mean that their goal for publishing their work was more succesful. So looking further into this, I found out that HarperCollins Publishers Inc. were the publishers with the most checkouts.
The Seattle Public Library created this data set
The parameters of the data are checkout months, year, type, and count, material type, title, ISBN, creator, publisher, published year, and subjects
As SPL has stated themselves, they have collected this data through multiple current and historical sources. They mention specifically for digital items, some sources they have got it from are Overdrive, Hoopla, Freegal, and more.
This data was collected because Seattle Pacific Library wanted to see different trends and rates of which items from their library were being checked out. Which were being checked out more, less, so many different trends they tried to look at with this data set.
I know that there are protests for books to be removed from libraries and other book stores because of political issues. With that in mind, were some books that were checked out removed from this data set specifically because of this reason?
Some possible limitations and challenges that this data has been that one, it only includes data from 2022-2023. Any other data outside of these years are non-existent in this data set. Yes, the amount of data in this data set is a lot. However, when we put that compared to every single checkout from every year at Seattle Pacific Library, it does not tell the whole story. For example, the highest checked-out book from this data set will not likely be the highest checked-out book all-time at Seattle Pacific Library. Two, the challenge that this data bring is that we don’t know who checked out the book. Sometimes people check out the same book multiple times. This then leads us to not being able to accurately find how many unique people have checked out the book without the risk of having duplicates of the same people in checkouts. If we had a way to count checkouts without having duplicates of the same people in it, it could lead to a different type of categories, books, or other data sets being the most checked out. It could also lead to the month where the most checkouts were seen to be different with this information. However, this data set does not have the ability to do that.
For this graph, I took a look at the checkout counts per month comparing physical and digital items. I included this chart into my report because I wanted to show how the digital era is here and that digital items are being checked out a growing amount while physical items are decreasing. The first thing I noticed with this chart was that over the months in 2022, the amount of digital checkouts saw an overall increase by the end of the year while physical copies saw a decrease. Something that also surprised me was that not once during any of the months was there a case where more physical items were checked out more than digital items at the library.
## `summarise()` has grouped output by 'Title'. You can override using the
## `.groups` argument.
## Warning in geom_point(mapping = aes(x = CheckoutMonth, y = Checkouts, color =
## Title, : Ignoring unknown aesthetics: text
For this chart, I took a look at some of Seuss’s books over the span of the year 2022 and looked at how often they were checked out per month. I looked into this data because Seuss is a well known name and these four different books by him all had different and interesting data. First off, out of the four books, I saw that the book Fox in socks was the most popular while it seemed like Happy birthday to you! was the least popular. I also noticed how the rates of the books being checked out increase and decrease go in a very steep rate. it was either checked out a lot, and than checked out way lower the next month. I grew up on Seuss books and I see that it is still being read by people in the library to this day.
## Warning in geom_col(mapping = aes(x = CheckoutMonth, y = Checkouts, fill =
## Checkouts, : Ignoring unknown aesthetics: text
For my last chart, I decided to focus on finding the total amount of checkouts for each month. I decided to make a column chart because I believed showing the visualizations next to each month would best show which month had lower or higher checkout counts each month. While analyzing this data, I found that August had the highest checkout count out of any month. While connecting all my charts, I also saw something interesting where all the numbers for the charts are down around February and April which I thought was really interesting. I also noticed that the numbers of checkouts were not going lower than 600,000, which helped show me that libraries are still up and running and are in good business.